Abstract. This paper presents a new method for automatically generating
numerical invariants for imperative programs. Given a program, our
procedure computes a binary input/output relation on program states
which over-approximates the behaviour of the program. It is compositional
in the sense that it operates by decomposing the program into
parts, computing an abstract meaning of each part, and then composing
the meanings. Our method for approximating loop behaviour is based on
first approximating the meaning of the loop body, extracting recurrence
relations from that approximation, and then using the closed forms to
approximate the loop. Our experiments demonstrate that on verification
tasks, our method is competitive with leading invariant generation and
verification tools.


1 Introduction 

Compositional program analyses operate by decomposing a program into parts,
computing an abstract meaning of each part, and then composing the meanings.
Compositional analyses have a number of desirable properties, including
scalability, parallelizability, and applicability to incomplete programs. However,
compositionality comes with a price: since each program fragment is analyzed
independently of its context, the analysis cannot benefit from contextual
information. This paper presents a compositional method for numerical invariant
generation which, despite loss of contextual information, compares favourably
with leading (non-compositional) verification techniques.

The analysis proposed in this paper aims to compute a transition relation
which over-approximates the behaviour of a given program. The use of transition
relations in compositional analysis stems from the fact that they can be
composed: for example, consider a program $P = P_1; P_2$ which consists of two
sub-programs $P_1$ and $P_2$ which are executed in sequence. A transition
invariant $\llbracket P \rrbracket$ for $P$ can be computed by computing transition
invariants $\llbracket P_1 \rrbracket$ and $\llbracket P_2 \rrbracket$ for the
sub-programs and then taking $\llbracket P \rrbracket$ to be the relational
composition:

$$\llbracket P \rrbracket = \{(s, s'') : \exists s'.\ (s, s') \in \llbracket P_1 \rrbracket \land (s', s'') \in \llbracket P_2 \rrbracket\}.$$
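As a small illustration of relational composition, the following Python sketch represents a transition relation as a finite set of (pre-state, post-state) pairs. The four-element state space and the two toy sub-programs are assumptions made only for this example; the analysis itself works with formulas, not enumerated sets.

```python
def compose(rel1, rel2):
    """Relational composition: pairs (s, s2) such that some s1 links them."""
    return {(s, s2) for (s, s1) in rel1 for (t, s2) in rel2 if s1 == t}

# Hypothetical state space: a single integer variable x with values 0..3.
STATES = range(4)

# Transition relations of two toy sub-programs (illustrative only).
p1 = {(x, x + 1) for x in STATES if x + 1 in STATES}   # P1: x := x + 1
p2 = {(x, 2 * x) for x in STATES if 2 * x in STATES}   # P2: x := 2 * x

# Transition relation of the sequential program P = P1; P2.
p = compose(p1, p2)
print(sorted(p))
```

Only the pre-state 0 survives here, because every other intermediate state leaves the toy state space under the second sub-program.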

A crucial question is how to compute abstractions of loops (i.e., loop
summaries). Our analysis is based on a classical idea: find recurrence relations for
variables modified in the body of a loop, and then use the closed forms for these
recurrences as the abstraction of the loop. The focus of research on recurrence
analysis has mainly been on computing the exact behaviour of a (necessarily)
limited class of loops, e.g., loops where the body is a sequence of affine
assignments (see the discussion of related literature). We shift the goal to
computing over-approximate behaviour of arbitrary loops. The main novelty of our
approach is to make synergistic use of recurrence analysis and compositionality:
on one hand, recurrence analysis can be used to compute accurate transition
formulas for loops; on the other hand, transition formulas for loop bodies can be
mined for recurrence relations to enable recurrence analysis.

Compositionality enables using recurrence analysis for arbitrary loops in two
ways. First, the fact that the transition formula for a loop is computed from
a transition formula for its body makes the control structure of the loop
irrelevant (e.g., whether it is a sequence of assignments or contains branching or
nested loops, its transition formula is just a formula). Second, having access
to a loop body formula when computing a loop summary opens the door to
using Satisfiability Modulo Theories (SMT) solvers to extract a broad range of
semantic recurrences. In particular, our analysis is able to exploit approximate
recurrences (inequations over linear terms) to compute interesting loop
invariants even for variables which do not satisfy recurrence equations in the
classical sense, thus extending the applicability of recurrence-based invariant
generation and overcoming a major barrier in its practical use.

In summary, this paper presents a compositional method for generating
numerical invariants (polynomial inequalities of unbounded degree among integer
and rational variables) for programs. The main technical contributions are as
follows.

1. We give a method for computing abstractions of loops using summaries for
   their bodies. This allows our analysis to apply to arbitrary code (with nested
   loops, unstructured loops, and arbitrary branching). It also makes it possible
   to use SMT solvers to extract semantic recurrence relations rather than
   syntactic recurrences obtained by pattern-matching source code.

2. We identify a class of recurrence (in)equations that can be efficiently
   extracted from loop bodies using SMT solving technology and solved using
   simple linear algebra.

3. We give a linearization algorithm which enables tractable (but necessarily
   approximate) reasoning about non-linear formulas over rationals and integers
   (Section 4).

4. We collect ideas from a diverse range of sources (including algebraic program
   analysis, recurrence analysis, linearization [20], and symbolic
   abstraction [26, 22, 19]) and synthesize them into a cohesive presentation
   which can be used as a foundation for further research on recurrence analysis.

We implemented linear recurrence analysis and used it to verify assertions for
a suite of benchmarks. Linear recurrence analysis is able to prove the correctness
of more benchmarks in this suite than any of the leading verification tools for
integer programs.






r := x                  // remainder
q := 0                  // quotient
while (r >= y):
    // subtract y from r
    t := y
    while (t != 0):
        r := r - 1
        t := t - 1
    q := q + 1
assert(x = q*y + r)


(a) Program text 



Fig. 1. An integer division program, computing a quotient and remainder. Statements
of the form $[\varphi]$ represent assumptions; i.e., statements which block if
$\varphi$ does not hold.

2 Overview 

We will adopt a simple intraprocedural model in which a program is represented
by a control flow automaton (CFA) where edges are labeled by program statements.
Figure 1 depicts such a CFA for a program which computes the quotient
and remainder of division of a variable x by a variable y. We use this model for
the sake of simplicity and to help keep the presentation of our analysis short and
self-contained. We hope that the basic idea behind the extension to procedures
(implemented in the tool), using the analysis to compute procedure summaries
[29], is clear without formal explanation.

Our analysis, linear recurrence analysis (LRA), is presented in the algebraic
framework described in [10]. Suppose that we wish to prove that the assertion
assert(x = q*y + r) always succeeds. We begin by computing the set of paths
from the entry vertex $v_{\mathit{entry}}$ to $v_8$ (the location corresponding to
the assert statement in the CFA). This set of paths is represented by a path
expression for the vertex $v_8$, which is a regular expression over an alphabet
of control flow edges. In principle, this can be accomplished by Kleene's
well-known algorithm for converting a finite automaton into a regular expression
[14] (but more efficient algorithms exist [30]). For example, the following is a
path expression for $v_8$:

$$\langle v_{\mathit{entry}}, v_1\rangle \cdot \langle v_1, v_2\rangle \cdot \underbrace{\Big(\langle v_2, v_3\rangle \cdot \langle v_3, v_4\rangle \cdot \overbrace{\big(\langle v_4, v_5\rangle \cdot \langle v_5, v_6\rangle \cdot \langle v_6, v_4\rangle\big)^*}^{\text{Inner loop}} \cdot \langle v_4, v_7\rangle \cdot \langle v_7, v_2\rangle\Big)^*}_{\text{Outer loop}} \cdot \langle v_2, v_8\rangle$$

Once we have a path expression representing the paths to $v_8$, we compute an
over-approximation of the executions to $v_8$ by evaluating the path expression in
some abstract domain. The main benefit of this algebraic framework is that an
analysis is defined simply by providing an interpretation for each of the regular
expression operators (sequencing, choice, and iteration, corresponding to the
control structures of structured programs), and then we may rely on a path
expression algorithm [14, 30] to efficiently "lift" the analysis to programs with
arbitrary control flow.
















Formally, a program analysis (in the framework of [10]) is defined by an
interpretation, which consists of a semantic algebra and a semantic function. A
semantic algebra consists of a universe which defines the space of possible
program meanings, and sequencing, choice, and iteration operators, which define
how to compose program meanings. A semantic function is a mapping from
control flow edges to elements of the universe which defines the meaning of each
control flow edge. A path expression is evaluated by interpreting the individual
edges using the semantic function, and interpreting the regular expression
operators using the corresponding operators of the semantic algebra (to compose the
interpretations of individual edges into interpretations of sets of program paths).
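The evaluation scheme just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the class names (`Edge`, `Seq`, `Choice`, `Star`), the dictionary-based interpretation, and the toy universe of finite relations over a six-element state space are all hypothetical and chosen for illustration. LRA itself uses transition formulas as its universe, not enumerated relations.

```python
from dataclasses import dataclass

# Path expressions: a single edge, a sequence, a choice, or an iteration.
@dataclass(frozen=True)
class Edge:
    name: str

@dataclass(frozen=True)
class Seq:
    left: object
    right: object

@dataclass(frozen=True)
class Choice:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    body: object

def evaluate(expr, interp):
    """Evaluate a path expression bottom-up in the given interpretation."""
    if isinstance(expr, Edge):
        return interp["edge"](expr.name)
    if isinstance(expr, Seq):
        return interp["seq"](evaluate(expr.left, interp), evaluate(expr.right, interp))
    if isinstance(expr, Choice):
        return interp["choice"](evaluate(expr.left, interp), evaluate(expr.right, interp))
    if isinstance(expr, Star):
        return interp["star"](evaluate(expr.body, interp))
    raise TypeError(f"not a path expression: {expr!r}")

STATES = range(6)  # toy state space: one counter with values 0..5

def compose(r1, r2):
    """Sequencing in the toy universe: relational composition."""
    return {(s, u) for (s, t) in r1 for (t2, u) in r2 if t == t2}

def closure(rel):
    """Iteration in the toy universe: reflexive-transitive closure."""
    result = {(s, s) for s in STATES}
    while True:
        bigger = result | compose(result, rel)
        if bigger == result:
            return result
        result = bigger

# Semantic function of the toy interpretation: one relation per edge label.
EDGES = {
    "guard": {(n, n) for n in STATES if n < 3},          # [n < 3]
    "inc": {(n, n + 1) for n in STATES if n + 1 in STATES},  # n := n + 1
    "exit": {(n, n) for n in STATES if n >= 3},           # [n >= 3]
}
relational = {"edge": EDGES.__getitem__, "seq": compose,
              "choice": lambda a, b: a | b, "star": closure}

# Path expression (guard . inc)* . exit for a loop "while (n < 3): n := n + 1".
loop = Seq(Star(Seq(Edge("guard"), Edge("inc"))), Edge("exit"))
summary = evaluate(loop, relational)
print(sorted(summary))
```

Swapping the `relational` dictionary for another interpretation changes the analysis without touching `evaluate`, which is the design point of the algebraic framework.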

Keeping this overall algorithm in mind, we proceed to describe the
interpretation which defines linear recurrence analysis.

LRA Universe. The semantic universe of LRA (i.e., the space of program
meanings) is the set of (not necessarily linear) arithmetic transition formulas.
If we let $\mathsf{Var}$ denote the set of program variables and $\mathsf{Var}'$ the
set of "primed" copies of program variables, then a transition formula is an
arithmetic formula with free variables in $\mathsf{Var} \cup \mathsf{Var}'$. Such a
formula represents an input/output relation between program states.


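LRA Semantic Function. The semantic function $\llbracket \cdot \rrbracket$ is a
function that maps each edge of a control flow automaton to its interpretation as
a transition formula. For example (again, considering Figure 1), we have

$$\llbracket \langle v_{\mathit{entry}}, v_1\rangle \rrbracket = \boxed{r' = x \land \mathit{stable}(\{q, t, x, y\})}$$

$$\llbracket \langle v_1, v_2\rangle \rrbracket = \boxed{q' = 0 \land \mathit{stable}(\{r, t, x, y\})}$$

$$\llbracket \langle v_2, v_3\rangle \rrbracket = \boxed{r \geq y \land \mathit{stable}(\{q, r, t, x, y\})}$$

where for $X \subseteq \mathsf{Var}$, we have
$\mathit{stable}(X) \triangleq \bigwedge_{x \in X} x' = x$; we use this to factor
out equalities from the formulas and make them more legible. Boxes around
formulas have no meaning, and are used only to make it easier to distinguish
between equalities in formulas and the meta-language.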


LRA Operators. The sequencing and choice operators of our analysis are
defined as follows:

$$\varphi \odot \psi \triangleq \exists x''.\ \varphi[x''/x'] \land \psi[x''/x] \qquad \text{(Sequencing)}$$

$$\varphi \oplus \psi \triangleq \varphi \lor \psi \qquad \text{(Choice)}$$

(where $\varphi[x''/x']$ denotes $\varphi$ with each primed variable $x'$ replaced by
its double-primed counterpart $x''$, and $\psi[x''/x]$ similarly replaces unprimed
variables with double-primed variables).

The semantic function, sequencing, and choice operators are sufficient to
analyze loop-free code. For example, we may consider how LRA computes a
transition invariant for the body of the inner loop of Figure 1:

$$\llbracket \langle v_4, v_5\rangle \cdot \langle v_5, v_6\rangle \rrbracket = \llbracket \langle v_4, v_5\rangle \rrbracket \odot \llbracket \langle v_5, v_6\rangle \rrbracket = \boxed{t > 0 \land r' = r - 1 \land \mathit{stable}(\{q, t, x, y\})}$$

$$\llbracket \langle v_4, v_5\rangle \cdot \langle v_5, v_6\rangle \cdot \langle v_6, v_4\rangle \rrbracket = \llbracket \langle v_4, v_5\rangle \cdot \langle v_5, v_6\rangle \rrbracket \odot \llbracket \langle v_6, v_4\rangle \rrbracket = \boxed{t > 0 \land r' = r - 1 \land t' = t - 1 \land \mathit{stable}(\{q, x, y\})}$$

The final step in describing our analysis is to provide a definition of the
iteration operator ($\circledast$) of LRA. The idea behind the definition of the
iteration operator is to use an SMT solver to extract recurrence relations from the
loop body, and then use the closed form of these recurrences for the abstraction of
the loop. We explain this in detail in Section 3. Here, we illustrate how LRA
works on the running example to provide some intuition on the analysis.

After computing a formula $\varphi_{inner}$ representing the body of the inner loop
(as given above), we apply the iteration operator $\circledast$ to compute a formula
representing any number of executions of the inner loop. The iteration operator
begins by extracting the recurrence equations shown in the table below. It then
computes closed forms for these recurrences, also shown in the table (where
$x^{(k)}$ denotes the value that the variable $x$ takes on the $k$th iteration of
the loop). Note that this table omits "uninteresting" recurrences (such as
$q' = q + 0$) which indicate that a variable does not change in a loop.

    Recurrence        Closed form
    $r' = r - 1$      $r^{(k)} = r^{(0)} - k$
    $t' = t - 1$      $t^{(k)} = t^{(0)} - k$

These closed forms are used to abstract the loop as follows:

$$\varphi_{inner}^{\circledast} = \boxed{\exists k.\ k \geq 0 \land r' = r - k \land t' = t - k \land \mathit{stable}(\{q, x, y\})}$$

$$\equiv \boxed{r' = r + t' - t \land t' \leq t \land \mathit{stable}(\{q, x, y\})}$$


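We may use this summary $\varphi_{inner}^{\circledast}$ for the inner loop to
compute a transition formula representing the body of the outer loop:

$$\varphi_{outer} = \llbracket \langle v_2, v_3\rangle \rrbracket \odot \llbracket \langle v_3, v_4\rangle \rrbracket \odot \varphi_{inner}^{\circledast} \odot \llbracket \langle v_4, v_7\rangle \rrbracket \odot \llbracket \langle v_7, v_2\rangle \rrbracket$$

$$= \boxed{q' = q + 1 \land r' = r + t' - y \land t' = 0 \land r \geq y \land \mathit{stable}(\{x, y\})}$$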


We then apply the iteration operator to compute a transition formula for the outer
loop. The recurrences found for the outer loop and their closed forms are shown in
the table below (again, with "uninteresting" recurrences omitted). We note that our
algorithm extracts these recurrences from $\varphi_{outer}$ using only semantic
operations: the fact that $\varphi_{outer}$ is an abstraction of a looping
computation is completely transparent to the analysis.

    Recurrence        Closed form
    $q' = q + 1$      $q^{(k)} = q^{(0)} + k$
    $r' = r - y$      $r^{(k)} = r^{(0)} - k y^{(0)}$

Using the closed forms of these recurrences, we compute the following transition
formula for the outer loop:

$$\varphi_{outer}^{\circledast} = \boxed{\exists k.\ k \geq 0 \land q' = q + k \land r' = r - ky \land \mathit{stable}(\{x, y\})}$$

$$\equiv \boxed{q' \geq q \land r' = r - (q' - q)y \land \mathit{stable}(\{x, y\})}$$

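Finally, we compute a transition formula which approximates all executions
which end at $v_8$ as follows:

$$\varphi_P = \llbracket \langle v_{\mathit{entry}}, v_1\rangle \rrbracket \odot \llbracket \langle v_1, v_2\rangle \rrbracket \odot \varphi_{outer}^{\circledast} \odot \llbracket \langle v_2, v_8\rangle \rrbracket$$

$$= \boxed{q' \geq 0 \land r' = x - q'y \land r' < y \land \mathit{stable}(\{x, y\})}$$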



























This formula is strong enough to imply that the assertion $x' = q' * y' + r'$ holds
at $v_8$. This is particularly interesting because it requires proving a non-linear
transition invariant for the loop, which is out of scope for many state-of-the-art
program analyzers.
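The summary derived in this overview can be sanity-checked on concrete runs. The sketch below transcribes the program of Figure 1 into Python and tests, for a range of sample inputs, that the final state satisfies the conjuncts of the summary read over the post-state ($q' \geq 0$, $r' = x - q'y$, $r' < y$). Restricting to $y \geq 1$ is an assumption made here so that every run terminates. Testing samples is of course not a proof; the point of the analysis is that it establishes the property for all inputs.

```python
def divide(x, y):
    """Direct transcription of the integer division program of Figure 1."""
    r = x          # remainder
    q = 0          # quotient
    while r >= y:
        t = y
        while t != 0:   # subtract y from r, one unit at a time
            r -= 1
            t -= 1
        q += 1
    return q, r

# Check the summary on sample inputs (y >= 1 so that the loops terminate).
for x in range(-5, 40):
    for y in range(1, 9):
        q, r = divide(x, y)
        assert q >= 0 and r == x - q * y and r < y
        assert x == q * y + r   # the assertion of Figure 1

print(divide(17, 5))
```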


3 Abstracting Loops with Linear Recurrence Analysis

In this section, we describe the iteration operator of linear recurrence analysis.
Suppose that we have a formula $\varphi_{body}$ which approximates the behaviour of
the body of a loop. Our goal is to compute a formula $\varphi_{body}^{\circledast}$
which represents the effect of zero or more executions of the loop body. Our
iteration operator works by extracting recurrence relations from the formula
$\varphi_{body}$ and then computing closed forms for these relations. We present our
iteration operator in three stages, based on the types of recurrence relations
being considered: simple recurrence equations, stratified recurrence equations, and
linear recurrence (in)equations. Simple and stratified recurrences are classical
classes of recurrence equations. Linear recurrence (in)equations generalize the
class of inequations presented in [5] by using stratified recurrences to generate
polynomial (rather than just linear) inequations. The main conceptual contribution
of this section is the idea to use SMT solvers to extract recurrences (and other
relevant information) from a loop body formula.

In the remainder of this section, we fix a formula $\varphi_{body}$ representing the
body of a loop. We assume that $\varphi_{body}$ is expressed in linear (rational and
integer) arithmetic; our strategy for dealing with non-linear arithmetic is
described in Section 4. We also assume that $\varphi_{body}$ is satisfiable (if it
is not, then we can take
$\varphi_{body}^{\circledast} \triangleq \bigwedge_{x \in \mathsf{Var}} x' = x$,
which represents zero iterations of the loop).

3.1 Simple recurrence equations 

We start by defining simple recurrences and induction variables. 

Definition 1. A simple recurrence for a formula $\varphi$ is an equation of the form
$x' = x + c$ (for a constant $c$) such that $\varphi \models x' = x + c$. If
$x' = x + c$ is a simple recurrence for $\varphi$, we say that $x$ satisfies the
recurrence $x' = x + c$, and if there is some $c$ such that $x$ satisfies the
recurrence $x' = x + c$, we say that $x$ is an induction variable.

Simple recurrences can be detected by first querying an SMT solver for a model
$m$ of $\varphi_{body}$, and then asking whether $\varphi_{body}$ implies
$x' = x + \llbracket x' - x \rrbracket^m$ (where $\llbracket x' - x \rrbracket^m$
denotes the interpretation of the term $x' - x$ in the model $m$). This
implication holds iff $x$ is an induction variable.

If $x$ is an induction variable that satisfies the recurrence $x' = x + c$, then the
closed form for $x$ is $x^{(k)} = x^{(0)} + kc$ (writing $x^{(k)}$ for the value
that $x$ obtains on the $k$th iteration of the loop). To provide some early
intuition on the iteration operator to be developed in the remainder of this
section, let us suppose that we are only interested in simple recurrences. Then a
possible definition for the iteration operator is

$$\varphi_{body}^{\circledast} \triangleq \exists k \geq 0.\ \bigwedge \{x' = x + kc : x' = x + c \in SR(\varphi_{body})\}$$

where $SR(\varphi_{body})$ is the set of simple recurrences satisfied by
$\varphi_{body}$.

The iteration operator defined above is sound (it over-approximates the
behaviour of any number of iterations of the loop, since each variable is either
described exactly by a recurrence or is not constrained at all), but it is
imprecise. The remainder of this section discusses more general recurrence equations
which can be used to compute more precise transition invariants for loops.

3.2 Stratified recurrence equations

Consider the example loop for this section, displayed below just before
Algorithm 1. We can see that $x$ satisfies a simple recurrence equation
$x' = x + 1$, and that $y$ satisfies a (non-simple) recurrence equation
$y' = y + x + 1$. A closed form for $y$'s recurrence is
$y^{(k)} = y^{(0)} + \sum_{i=0}^{k-1} (x^{(i)} + 1)$. Since $x$ satisfies a simple
recurrence ($x' = x + 1$), we have a closed form for $x^{(i)}$, so we may simplify
this recurrence and remove the summation:

$$y^{(k)} = y^{(0)} + \sum_{i=0}^{k-1} (x^{(0)} + i + 1) = y^{(0)} + kx^{(0)} + k + \sum_{i=0}^{k-1} i = y^{(0)} + kx^{(0)} + \frac{k(k+1)}{2}.$$
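The closed form above can be checked against direct execution of the loop body. In this minimal sketch, the function names are hypothetical and the loop guard is deliberately ignored (an assumption): the closed form describes $k$ consecutive executions of the body, and the guard is accounted for separately by the analysis.

```python
def run_body(x0, y0, k):
    """Execute the loop body (x := x + 1; y := y + x) exactly k times."""
    x, y = x0, y0
    for _ in range(k):
        x = x + 1
        y = y + x
    return x, y

def closed_form(x0, y0, k):
    """x^(k) = x^(0) + k and y^(k) = y^(0) + k*x^(0) + k(k+1)/2."""
    return x0 + k, y0 + k * x0 + k * (k + 1) // 2

# Compare the two on a grid of initial states and iteration counts.
for x0 in range(-4, 5):
    for y0 in range(-4, 5):
        for k in range(0, 8):
            assert run_body(x0, y0, k) == closed_form(x0, y0, k)
```

The quadratic term in $k$ is what makes the closed form of a stratified recurrence non-linear even though the loop body is affine.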

Stratified recurrence equations generalize this idea: starting from simple
recurrence equations, we solve more and more complicated recurrences using the
closed forms for simpler ones. As with the example above, stratified recurrences
have non-linear closed forms. Non-linear invariant generation is not the main
focus of our work, but it is sometimes a necessary intermediate step for
proving linear invariants in a compositional setting: since our analysis cannot take
advantage of contextual information when analyzing a loop, we generate a
non-linear invariant and then, after the analysis has examined more context,
simplify it (using the linearization algorithm from Section 4).

Definition 2. Let $\varphi$ be a formula. The stratified recurrence equations (and
stratified induction variables) of $\varphi$ are defined inductively as:

- A simple recurrence equation which is satisfied by $\varphi$ is a stratified
  recurrence equation of $\varphi$ (and a simple induction variable is a stratified
  induction variable) at stratum 0.

- Let $y$ denote a vector of the stratified induction variables of strata
  $\leq N$. A recurrence of the form $x' = x + cy$ (where $c$ is a vector of
  constants) is a stratified recurrence at stratum $N + 1$ (and if $x$ satisfies
  such a recurrence, it is a stratified induction variable at stratum $N + 1$).

We use $\mathit{siv}(\varphi)$ to denote the set of all stratified induction
variables of $\varphi$.

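Let us now discuss how stratified recurrences are detected from a loop body
formula $\varphi_{body}$. We begin by computing the affine hull
$\mathit{aff}(\varphi_{body})$ of $\varphi_{body}$ (Algorithm 1).¹

Example loop (Section 3.2):

    while (x < 10):
        x := x + 1
        y := y + x
        z := 2 * x

Iteration operator restricted to simple recurrences (Section 3.1):

$$\varphi_{body}^{\circledast} \triangleq \exists k \geq 0.\ \bigwedge \{x' = x + kc : x' = x + c \in SR(\varphi_{body})\}$$

Algorithm 1: Affine hull.

Input : Satisfiable formula $\varphi_{body}$
Output: Affine hull of $\varphi_{body}$
$H \leftarrow \bot$; $\psi \leftarrow \varphi_{body}$;
while there exists a model $m$ of $\psi$ do
    $H' \leftarrow \bigwedge \{x = \llbracket x \rrbracket^m : x \in \mathsf{Var} \cup \mathsf{Var}'\}$;
    $H \leftarrow H \sqcup H'$;        /* Join in the domain of linear equalities */
    $\psi \leftarrow \psi \land \neg H$;
end
return $H$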

Definition 3. The affine hull $\mathit{aff}(\varphi)$ of a formula $\varphi$ is the
smallest affine set which contains $\varphi$, represented as (the set of solutions
to) a system of equations $Ax = b$, where
$x = [x_1 \cdots x_n\ x_1' \cdots x_n']^T$. Logically, $\mathit{aff}(\varphi)$ is a
system of equations which satisfies the following three properties: (1)
$\varphi \models \mathit{aff}(\varphi)$, (2) every linear equation over
$\mathsf{Var} \cup \mathsf{Var}'$ which is implied by $\varphi$ is also implied by
$\mathit{aff}(\varphi)$, and (3) no equation in $\mathit{aff}(\varphi)$ is implied
by the others.
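To make the notion of an affine hull concrete, here is a hedged Python sketch that computes the affine hull of a finite list of sample models using exact rational arithmetic. It is not Algorithm 1: the SMT-driven loop is replaced by a fixed list of hand-picked models (an assumption), so the result is only the hull of those samples and may be too strong if the samples are not representative. The function names `null_space` and `affine_hull` are hypothetical.

```python
from fractions import Fraction

def null_space(rows, ncols):
    """Basis of {a : row . a = 0 for every row}, via Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []                      # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue                 # no pivot in this column: it is free
        m[r], m[pr] = m[pr], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -m[row_idx][free]
        basis.append(vec)
    return basis

def affine_hull(points):
    """Equations (a, b), meaning a . x = b, that hold on all sample points."""
    p0 = points[0]
    diffs = [[u - v for u, v in zip(p, p0)] for p in points[1:]]
    return [(a, sum(ai * pi for ai, pi in zip(a, p0)))
            for a in null_space(diffs, len(p0))]

# Sample models of the body "x := x + 1; y := y + x", as tuples (x, y, x', y').
models = [(0, 0, 1, 1), (1, 0, 2, 2), (0, 5, 1, 6), (3, 2, 4, 6)]
for coeffs, const in affine_hull(models):
    print(coeffs, "=", const)
```

On these samples the sketch recovers two equations, which correspond to $x' = x + 1$ and $y' = y + x + 1$.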

Our strategy for detecting stratified recurrences is based on the following
lemma. Combined with property (2) of $\mathit{aff}(\varphi_{body})$ above, this
lemma implies that any equation implied by $\varphi_{body}$ can be expressed as a
linear combination of the equations in $\mathit{aff}(\varphi_{body})$.

Lemma 1 ([28], Corollary 3.1d). Let $A$ be a matrix, $b$ be a column vector,
$c$ be a row vector, and $d$ be a constant. Assume that the system $Ax = b$ has
a solution. Then $Ax = b$ implies $cx = d$ iff there is a row vector $\lambda$ such
that $\lambda A = c$ and $\lambda b = d$.

Let us write $\mathit{aff}(\varphi_{body})$ as $Ax = b$. Suppose that we have
detected all recurrences of strata $< N$, and that we want to determine whether a
variable $x_i$ ($0 < i \leq n$) is an induction variable at stratum $N$. Then we ask
whether there exist $\lambda$, $c$, and $d$ such that:

- $\lambda A = c$ and $\lambda b = d$ (i.e., $cx = d$ is implied by
  $\mathit{aff}(\varphi_{body})$ and thus by $\varphi_{body}$).

- $c_i = 1$ and $c_{i+n} = -1$ (the coefficients of $x_i$ and $x_i'$ are 1 and
  $-1$, respectively).

- For all $j$ such that $j \neq i + n$ and $n < j \leq 2n$, $c_j = 0$ (except for
  $x_i'$, all coefficients of primed variables are 0).

- For all $j$ such that $j \neq i$, $x_j$ is not an induction variable of strata
  $< N$, and $0 < j \leq n$, $c_j = 0$ (except for $x_i$ and induction variables of
  strata $< N$, all coefficients for unprimed variables are 0).

Thus, after computing the affine hull of $\varphi_{body}$, determining whether a
given variable satisfies a stratified recurrence is simply a matter of solving a
system of linear equations (e.g., using Gaussian elimination).


¹ This algorithm is a specialization of the one in [26] to the abstract domain of
linear equalities.







Closed forms for stratified recurrences. We first state a lemma:

Lemma 2. The closed form for a stratified induction variable $x$ of stratum $N$ is
of the form

$$x^{(k)} = x^{(0)} + p_0(k) + p_1(k)y_1^{(0)} + \cdots + p_n(k)y_n^{(0)}$$

where each $y_i$ is a stratified induction variable of strata $< N$ and each
$p_i(k) \in \mathbb{Q}[k]$ is a polynomial of one variable with rational
coefficients.

Our algorithm for solving stratified recurrences is based on a constructive
proof for this lemma. We proceed by induction on strata. The base case is trivial.
Suppose that we have a recurrence at stratum $N$ (and all $y_1, \ldots, y_n$ are of
strata $< N$): $x' = x + c_1y_1 + \cdots + c_ny_n + b$. Then we may write

$$x^{(k)} = x^{(0)} + \sum_{i=0}^{k-1} \big(c_1y_1^{(i)} + \cdots + c_ny_n^{(i)} + b\big).$$

By our induction hypothesis, each $y_j^{(i)}$ can be written as a linear
term with coefficients from $\mathbb{Q}[i]$. It follows that there exist
$p_0, \ldots, p_n \in \mathbb{Q}[k]$ so that

$$c_1y_1^{(i)} + \cdots + c_ny_n^{(i)} + b = p_0(i) + p_1(i)y_1^{(0)} + \cdots + p_n(i)y_n^{(0)}.$$

Thus we have

$$x^{(k)} = x^{(0)} + \sum_{i=0}^{k-1} \big(p_0(i) + p_1(i)y_1^{(0)} + \cdots + p_n(i)y_n^{(0)}\big)$$

$$= x^{(0)} + \sum_{i=0}^{k-1} p_0(i) + y_1^{(0)}\sum_{i=0}^{k-1} p_1(i) + \cdots + y_n^{(0)}\sum_{i=0}^{k-1} p_n(i).$$

The closed form of a summation of a polynomial of degree $m$ is a polynomial
of degree $m + 1$. We can find this polynomial via curve fitting (i.e., we compute
the first $m + 1$ terms of the summation and then solve the corresponding linear
system of equations for the coefficients of the polynomial).
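The curve-fitting step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the tool's code: polynomials are represented as coefficient lists `[a0, a1, ...]`, the helper names are hypothetical, and the sketch samples the partial sums at the $m + 2$ points $k = 0, \ldots, m + 1$ (the trivial sample $S(0) = 0$ plus the first $m + 1$ partial sums) so that the Vandermonde system is square.

```python
from fractions import Fraction

def eval_poly(coeffs, k):
    """Evaluate a0 + a1*k + a2*k^2 + ... at the integer k."""
    return sum(Fraction(a) * k**i for i, a in enumerate(coeffs))

def solve(matrix, rhs):
    """Gauss-Jordan elimination for a square, invertible system."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for c in range(n):
        pr = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pr] = aug[pr], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def sum_closed_form(coeffs):
    """Coefficients of S(k) = sum_{i=0}^{k-1} p(i), a polynomial of degree m + 1."""
    m = len(coeffs) - 1
    ks = range(m + 2)
    samples, total = [], Fraction(0)
    for k in ks:
        samples.append(total)            # S(k)
        total += eval_poly(coeffs, k)    # S(k + 1) = S(k) + p(k)
    vandermonde = [[Fraction(k) ** j for j in range(m + 2)] for k in ks]
    return solve(vandermonde, samples)

# p(i) = i + 1, the summand from the example of Section 3.2.
print(sum_closed_form([1, 1]))
```

For the summand $i + 1$ the fitted polynomial is $\frac{1}{2}k + \frac{1}{2}k^2 = \frac{k(k+1)}{2}$, matching the closed form derived by hand for $y$.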

3.3 Linear recurrence (in)equations 

Recurrence equations (such as the simple and stratified varieties) yield very
accurate approximations for some variables, but what about variables which do not
satisfy any recurrence equation? For example, consider that neither $x$ nor $y$
satisfies a recurrence equation in the loop displayed below (after Definition 4).
However, they do satisfy recurrence inequations: $x - 1 \leq x'$, $x' \leq x$,
$y - 1 \leq y'$, and $y' \leq y$. These inequations can be closed to yield
$x^{(0)} - k \leq x^{(k)}$, $x^{(k)} \leq x^{(0)}$, $y^{(0)} - k \leq y^{(k)}$, and
$y^{(k)} \leq y^{(0)}$. In this section, we discuss linear recurrence
(in)equations, which allow us to compute good approximations for loops that cannot
be completely described by recurrence equations.

Definition 4. A linear recurrence (in)equation of a formula $\varphi$ is an
(in)equation which is implied by $\varphi$ and which is of the form

$$cx' \bowtie cx + by + d$$

where $\bowtie\ \in \{<, \leq, =\}$; $x$ is any vector of variables, $y$ is a vector
of stratified induction variables in $\varphi$, $c$, $b$ are constant vectors, and
$d$ is a constant.

    while (x > 0 ∧ y > 0):
        if (*): x := x - 1
        else:   y := y - 1




Linear recurrence (in)equations generalize recurrence equations in two ways:
first, they allow for inequalities rather than equations. Second, they allow
recurrences for linear terms, rather than just variables. For example, the linear
recurrence equation $(x' + y') = (x + y) - 1$ is satisfied by the body of the loop
above (each iteration decrements exactly one of the two variables), which can be
closed to yield $(x^{(k)} + y^{(k)}) = (x^{(0)} + y^{(0)}) - k$.
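The closed (in)equations for this loop can be checked on sampled executions. The sketch below is an illustration under stated assumptions: the nondeterministic branch `(*)` is resolved with a seeded pseudo-random generator, and the function names are hypothetical. Since every iteration decrements exactly one of the two variables by one, the sum $x + y$ drops by exactly one per iteration, while each individual variable drops by at most one.

```python
import random

def simulate(x0, y0, seed=0):
    """Run the loop once, resolving the branch (*) with a seeded RNG."""
    rng = random.Random(seed)
    x, y = x0, y0
    trace = [(x, y)]                 # trace[k] is the state after k iterations
    while x > 0 and y > 0:
        if rng.random() < 0.5:
            x -= 1
        else:
            y -= 1
        trace.append((x, y))
    return trace

def closed_inequations_hold(trace):
    """Check the closed forms at every iteration count k along a trace."""
    x0, y0 = trace[0]
    return all(
        x0 - k <= x <= x0            # x^(0) - k <= x^(k) <= x^(0)
        and y0 - k <= y <= y0        # y^(0) - k <= y^(k) <= y^(0)
        and x + y == x0 + y0 - k     # the recurrence on the term x + y
        for k, (x, y) in enumerate(trace)
    )

assert all(closed_inequations_hold(simulate(5, 7, seed)) for seed in range(20))
```

The equation on $x + y$ is exact even though neither variable alone satisfies a recurrence equation, which is the motivation for allowing recurrences over linear terms.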

We now describe our method for detecting and solving linear recurrence
(in)equations. We begin by introducing a set of difference variables $\delta_x$,
one for each variable $x \notin \mathit{siv}(\varphi_{body})$ (variables which do
belong to $\mathit{siv}(\varphi_{body})$ are already precisely described by
recurrence equations, so we need not approximate them). We then compute (via
Algorithm 2) the convex hull of the formula $\psi$ defined as:

$$\psi \triangleq \exists X.\ \varphi_{body} \land \bigwedge \{\delta_x = x' - x : x \in \mathsf{Var} \setminus \mathit{siv}(\varphi_{body})\}$$

where $X$ is $\mathsf{Var}' \cup (\mathsf{Var} \setminus \mathit{siv}(\varphi_{body}))$.


Algorithm 2: Convex hull.

Input : Satisfiable formula $\psi$, set of variables $X$
Output: Convex hull of $\exists X.\psi$
$P \leftarrow \bot$;
while there exists a model $m$ of $\psi$ do
    Let $Q$ be a cube of the DNF of $\psi$ s.t. $m \models Q$;
    $Q \leftarrow \mathit{project}(Q, X)$;    /* Polyhedral projection */
    $P \leftarrow P \sqcup Q$;                /* Polyhedral join */
    $\psi \leftarrow \psi \land \neg P$;
end
return $P$


Geometrically, the convex hull $\mathit{hull}(\varphi_{body})$ is the smallest
convex polyhedron which contains $\varphi_{body}$. Logically, it is a set of
(in)equations such that (1) every (in)equation in $\mathit{hull}(\varphi_{body})$ is
implied by $\varphi_{body}$, and (2) any linear (in)equation (over
$\mathsf{Var} \cup \mathsf{Var}'$) which is implied by $\varphi_{body}$ is also
implied by $\mathit{hull}(\varphi_{body})$. For example,
$\mathit{hull}(\varphi_{body})$ for the loop above (with $\delta_x = x' - x$ and
$\delta_y = y' - y$) is:

$$-1 \leq \delta_x \land \delta_x \leq 0 \land -1 \leq \delta_y \land \delta_y \leq 0 \land \delta_x + \delta_y = -1$$

We note that the only variables which appear in the (in)equations in
$\mathit{hull}(\varphi_{body})$ are (stratified) induction variables and difference
variables. Thus, we may write any (in)equation in $\mathit{hull}(\varphi_{body})$ as
$c\delta \bowtie by + d$ (where $\delta$ is the vector of difference variables, $y$
is the vector of stratified induction variables, $c$ and $b$ are constant vectors,
and $d$ is a constant). Recalling the definition of the difference variables, we
may rewrite such an inequation as $c(x' - x) \bowtie by + d$ and then rewrite again
as $cx' \bowtie cx + by + d$, which matches the definition of linear recurrence
(in)equations given in Definition 4.

We may close such a linear recurrence (in)equation as follows:

$$cx^{(k)} \bowtie cx^{(0)} + \sum_{i=0}^{k-1} \big(by^{(i)} + d\big)$$

We can compute a closed form for the summation
$\sum_{i=0}^{k-1} \big(by^{(i)} + d\big)$ as in the preceding section.


3.4 Loop guards 

A loop body typically contains crucial information about the execution of the
loop that cannot be captured by recurrence relations. For example, consider the
loop in Section 3.2. Supposing that the loop executes $n$ times, we must have that
$x^{(k)} < 10$ for each $k < n$. Further, consider that the variable $z$ is a
function of the simple induction variable $x$, and so $z^{(k)}$ can be described
precisely in terms of the pre-state variables (even though it does not itself
satisfy any recurrence). Since $z$ is assigned $2x$ after $x$ has been incremented,
we have:

$$z^{(k)} = \begin{cases} z^{(0)} & \text{if } k = 0 \\ 2(x^{(0)} + k) & \text{otherwise.} \end{cases}$$

The question is: how can we recover this type of information from a loop body
formula?

We define the guard of a transition formula $\varphi$ as follows:

$$\mathit{guard}(\varphi) \triangleq (\exists \mathsf{Var}'.\ \varphi) \land (\exists \mathsf{Var}.\ \varphi)$$

If $\varphi$ is a loop body formula, then $\mathit{guard}(\varphi)$ is a formula
which over-approximates the effect of executing at least one iteration of the loop.
Intuitively, $(\exists \mathsf{Var}'.\ \varphi)$ acts as a precondition that must
hold before every iteration of the loop and $(\exists \mathsf{Var}.\ \varphi)$ as a
postcondition that must hold after each iteration.

Consider again the example loop in Section 3.2; we have the following loop
body formula:

$$\varphi_{body} = x < 10 \land x' = x + 1 \land y' = y + x' \land z' = 2x'$$

We compute $\mathit{guard}(\varphi_{body})$ as follows:

$$\mathit{guard}(\varphi_{body}) = (\exists x', y', z'.\ \varphi_{body}) \land (\exists x, y, z.\ \varphi_{body}) \equiv (x < 10) \land (x' < 11 \land z' = 2x'),$$

and thereby recover the desired information about $x$ and $z$.

Since loop body formulas may be large, it may be advantageous in practice
to simplify the guard formula by eliminating the quantifiers (as we did above).
A second option, which is more efficient but less precise, is to over-approximate
quantifier elimination. Two possibilities are to use Algorithm 2 to compute the
convex hull of $\mathit{guard}(\varphi_{body})$, or to use optimization modulo
theories [19] to compute intervals for each pre- and post-state variable in
$\varphi_{body}$.


3.5 Bringing it all together 

We close this section by describing how the pieces defined in this section fit into
the iteration operator of linear recurrence analysis. We let
$CR(\varphi_{body})$ denote the set of closed linear recurrence (in)equations
(including simple and stratified recurrence equations) satisfied by
$\varphi_{body}$. Each such (in)equation is of the form $cx^{(k)} \bowtie t$, where
the free variables of $t$ are drawn from $\{x^{(0)} : x \in \mathsf{Var}\}$ and a
distinguished variable $k \notin \mathsf{Var}$ indicating the loop iteration. We
define

$$\varphi_{body}^{+} \triangleq \Big(\exists k.\ k \geq 1 \land \bigwedge \{cx' \bowtie t[x^{(0)} \mapsto x] : cx^{(k)} \bowtie t \in CR(\varphi_{body})\}\Big) \land \mathit{guard}(\varphi_{body})$$

where $t[x^{(0)} \mapsto x]$ denotes the term $t$ with every variable of the form
$x^{(0)}$ replaced by the corresponding variable $x$.

Finally, our iteration operator is defined as:

$$\varphi_{body}^{\circledast} \triangleq \varphi_{body}^{+} \lor \bigwedge_{x \in \mathsf{Var}} x' = x$$

4 Linearization 

The iteration operator presented in the previous section relies heavily on using
an SMT solver to extract information from loop body formulas. This strategy
requires that loop body formulas are expressed in a decidable theory which is
supported by SMT solvers (in particular, linear arithmetic). However, a
program may contain non-linear instructions, and even if it does not, our iteration
operator may introduce non-linearity (consider the example of Figure 1, where the
transition formula for the outer loop contains the non-linear proposition
$r' = x - q'y$). Our solution to this problem is to linearize non-linear formulas
before passing them to the iteration operator.

Linearization is an operation that, given an (arbitrary) arithmetic formula
$\varphi$, computes a formula $\mathit{lin}(\varphi)$ which over-approximates
$\varphi$ (i.e., $\varphi \Rightarrow \mathit{lin}(\varphi)$), but which is
expressed in linear arithmetic. There is generally no best approximation
of a non-linear formula as a linear formula, so our method is (necessarily) a
heuristic.

We explain our linearization algorithm informally using an example. Consider
the following non-linear formula (where $w$, $x$, $y$, $z$ are integers):

$$\psi \triangleq 1 \leq w = x < y < 5 \land w * y \leq z \leq x * y$$

Our algorithm begins by normalizing $\psi$, separating it into a linear part and a
set of non-linear equations (introducing existentially quantified temporary
variables as necessary). For example, the result of normalizing $\psi$ is:

$$(1 \leq w = x < y < 5 \land \gamma_0 \leq z \leq \gamma_1) \land (\gamma_0 = w * y \land \gamma_1 = x * y)$$

The left conjunct is a linear over-approximation of $\psi$, but it is very
imprecise: semantically equal (but syntactically distinct) non-linear terms become
semantically unequal in the over-approximation, and all information about the
magnitude of non-linear terms is lost. To increase precision of this approximation,
we use two strengthening steps.

1. We replace the non-linear operations with uninterpreted function symbols
   and then compute the affine hull of the resulting formula to infer equalities
   between non-linear terms. For our example $\psi$, we discover that
   $\gamma_0 = \gamma_1$.

2. We compute concrete and symbolic intervals for non-linear terms. Consider
   $\gamma_1 = x * y$ from our example $\psi$. We first compute concrete
   ($x \in [1, 3]$ and $y \in [2, 4]$) and symbolic ($x \in [x, x]$ and
   $y \in [y, y]$) intervals for the operands $x$ and $y$, using symbolic
   optimization [19] to compute the concrete intervals. We obtain a concrete
   interval for $x * y$ ($x * y \in [2, 12]$) by multiplying the concrete intervals
   of its operands. We obtain symbolic intervals for $x * y$
   ($x * y \in [y, 3y]$ and $x * y \in [2x, 4x]$) by multiplying the concrete
   interval for $x$ by the symbolic interval for $y$ and vice versa. As a result of
   interval computation, we discover:
   $2 \leq \gamma_1 \leq 12 \land y \leq \gamma_1 \leq 3y \land 2x \leq \gamma_1 \leq 4x$.

Finally, we take $\mathit{lin}(\psi)$ to be the initial coarse linear approximation
of $\psi$ conjoined with the facts discovered by the two strengthening steps.
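The interval part of the second strengthening step can be illustrated with a short Python sketch. It assumes the concrete operand intervals are already known (here they are hard-coded from the example rather than obtained from a solver), and it scales a symbolic operand by the bounds of the other operand's concrete interval, which is valid in this example because both operands are non-negative. The names are hypothetical. The brute-force enumeration at the end only double-checks the discovered facts on all integer models of the linear part of the example.

```python
def mul_interval(a, b):
    """Product of two closed intervals given as (lo, hi) pairs."""
    products = [p * q for p in a for q in b]
    return min(products), max(products)

X_RANGE, Y_RANGE = (1, 3), (2, 4)            # concrete intervals from the example
concrete = mul_interval(X_RANGE, Y_RANGE)    # bounds on gamma_1 = x * y

def facts_hold(x, y):
    """Check the three interval facts for one concrete pair (x, y)."""
    g = x * y
    return (concrete[0] <= g <= concrete[1]
            and X_RANGE[0] * y <= g <= X_RANGE[1] * y    # y <= g <= 3y
            and Y_RANGE[0] * x <= g <= Y_RANGE[1] * x)   # 2x <= g <= 4x

# All integer pairs satisfying 1 <= x < y < 5 (w is equal to x in the example).
models = [(x, y) for x in range(-10, 11) for y in range(-10, 11) if 1 <= x < y < 5]
assert all(facts_hold(x, y) for x, y in models)
print(concrete)
```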

We expect linearization to have broad applications outside of the context in 
which we presented it, particularly in program analysis, where over-approximation 
can be tolerated but non-linear terms cannot. Finding improved linearization 
heuristics is an interesting direction of future work. 


5 Experiments 


We wrote a tool which implements LRA and analyzes C code (using the CIL
[24] frontend). We use Z3 [9] to resolve SMT queries that result from applying
the iteration operator and checking assertion violations. Polyhedra operations
are passed to the New Polka library implemented in Apron [4]. The quantifier
elimination algorithm from [22] is used to compute loop guards.

We tested two different configurations of LRA: one which is fully compositional
(LRA-Comp) and does not take advantage of contextual information,
and one (LRA) which uses an intraprocedural polyhedron analysis [8] to gain
some contextual information, but which is otherwise compositional. We compare
LRA’s performance against the state-of-the-art invariant generation and
verification tools CPAChecker (overall winner of the 2015 Software Verification
Competition) and SeaHorn (winner of the loops category among tools which
are sound for verification).

To evaluate the precision of LRA we used it to verify the correctness of a
suite of 119 small loop benchmarks of varying difficulty. Our benchmark suite
was drawn from the loops category of the 2015 Software Verification Competition
(SVComp-15), as well as a set of non-linear benchmarks (Non-linear), such as the
one in Figure 1. The results for the 81 safe, integer-only benchmarks from these
suites are shown in Table 1. The suite also contains 38 unsafe benchmarks: LRA
and LRA-Comp have no false negatives on these benchmarks; CPAChecker
has 3 and SeaHorn has 2.

Our results demonstrate that LRA is an effective invariant generation algorithm.
Even the fully compositional variant of LRA (LRA-Comp) is able to
prove safety for 80% of the benchmarks we considered. We also note that there
are 8 benchmarks for which LRA can prove safety but CPAChecker and
SeaHorn cannot.


The tool and benchmarks are available at http://cs.toronto.edu/~zkincaid/lra

Benchmark suite   # Bench   LRA        LRA-Comp   CPAChecker   SeaHorn
SVComp-15         74        65         60         37           65
Non-linear        7         6          5          1            3
Total             81        71 (88%)   65 (80%)   38 (47%)     68 (85%)

Running time across all benchmark suites
Mean                        5.4s       3.0s       42.4s        37.7s
Median                      0.8s       0.8s       1.6s         0.2s

Table 1. Experimental results.


6 Related work 

There is a great deal of work on compositional invariant generation and
acceleration which is related to the technique described in this paper. In this
section, we compare our technique to a sampling of this work.

Recurrence analysis. The idea of using closed forms of recurrence relations to
approximate loops has appeared in a number of other papers. Generally speaking,
our work differs from previous work in two essential ways: first, we use an
SMT solver to extract semantic recurrences, rather than syntactic recurrences.
Second, we consider approximate recurrences (inequations over linear terms)
rather than exact recurrences (equations over variables). A survey of some of
this work follows.

Ammarguellat and Harrison present a method for detecting induction variables
which is compositional in the sense that it uses closed forms for inner loops
in order to recognize nested recurrences [1]. Maps from variables to symbolic
terms (effectively a symbolic constant propagation domain) are used as the
abstract domain. Kovács presents a technique for discovering invariant polynomial
equations based on solving recurrence relations [15]. The simple and stratified
recurrence equations considered in this paper are a strict subset of the
recurrences considered in [15], but our algorithm for solving recurrences is simpler.
Kroening et al. [16] present a technique for computing under-approximations
of loops which uses polynomial curve-fitting to directly compute closed forms
for recurrences rather than extracting recurrences and then solving them in a
separate step.

Ancourt et al. present a method for computing recurrence inequations for
while loops with affine bodies [2]. Like the method we present in Section 3.3,
their method is based on using difference variables and polyhedral projections.
Our method generalizes this work by (1) extending it to arbitrary control flow,
with (possibly non-linear) formulas as bodies rather than affine transformations, and
(2) integrating recurrence inequations with stratified induction variables, thereby
enabling the computation of invariant polynomial inequations. Ancourt
et al. briefly discuss a method for computing invariant polynomial inequations,
but it is based on higher-order differences rather than stratified recurrence
inequations. For example, in Figure 1 the analysis discussed in [2] would be able
to prove that r is decremented by a constant amount at every loop iteration,
but could not prove that the constant amount is exactly y.
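
The difference between an exact stratified recurrence and a recurrence inequation can be seen on a small hypothetical loop body, sketched below in Python. The body is an assumption made for illustration (it is not the loop from the figure): y is left unchanged, r is decremented by y, and a counter n grows by either 1 or 2 on each iteration. The function names `run` and `closed_form_holds` are likewise illustrative.

```python
from itertools import product

def run(r, y, n, choices):
    """Run the hypothetical loop body once per entry of `choices`:
    r decreases by y, y is unchanged, n grows by 1 or 2 (nondeterministic)."""
    for step in choices:
        r, n = r - y, n + step
    return r, y, n

def closed_form_holds(r, y, n, k):
    """Check the closed forms on every execution of exactly k iterations."""
    for choices in product((1, 2), repeat=k):
        r_k, y_k, n_k = run(r, y, n, choices)
        # Stratified recurrence: y is constant, so r' = r - y solves to
        # r_k = r - k*y, a polynomial (non-linear) relation in k and y.
        exact = (y_k == y) and (r_k == r - k * y)
        # Recurrence inequation: n + 1 <= n' <= n + 2 solves to
        # n + k <= n_k <= n + 2*k.
        approx = n + k <= n_k <= n + 2 * k
        if not (exact and approx):
            return False
    return True
```

The closed form for r mentions the product k*y, which is why relating the decrement to y requires solving the recurrence for r after the one for y, rather than only observing that successive differences of r are constant.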

















Acceleration. Acceleration is a technique closely related to recurrence analysis
that was pioneered in infinite-state model checking [3, 6, 11], and which has
recently found use in program analysis [12, 13, 18]. Given a set of reachable states
and an affine transformation describing the body of a loop, acceleration computes
an exact post-image which describes the set of reachable states after executing
any number of iterations of the loop (although recent work on abstract
acceleration computes over-approximate post-images [12, 13]). In contrast,
our technique is approximate rather than exact, and computes loop summaries
rather than post-images. A result of these two features is that our analysis can
be applied to arbitrary loops, while acceleration is classically limited to simple
loops where the body consists of a sequence of assignment statements.
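
The distinction between a post-image and a loop summary can be illustrated with a minimal Python sketch for a hypothetical loop whose body is the translation x := x + c. The iteration count is capped by `bound` only so that the sets stay finite; the names `post_image`, `summary`, and `apply_summary` are assumptions made for this example.

```python
def post_image(init, c, bound):
    """Values of x reachable from the set `init` by running x := x + c
    for 0..bound iterations. The result depends on the initial set."""
    return {x0 + k * c for x0 in init for k in range(bound + 1)}

def summary(x, x_next, c, bound):
    """Loop summary as a relation on (pre, post) states: there is some k
    in 0..bound with x_next = x + k*c. No initial set is needed."""
    return any(x_next == x + k * c for k in range(bound + 1))

def apply_summary(init, c, bound, universe):
    """Recover a post-image by composing an initial set with the summary."""
    return {x1 for x0 in init for x1 in universe if summary(x0, x1, c, bound)}
```

A summary is computed once, independently of the calling context, and can then be composed with any set of initial states, which is what makes it suitable for a compositional analysis.
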

Compositional program analysis. Compositional program analysis has a
long history. Particular examples are interprocedural analyses based on
summarization [29] and elimination-style dataflow analyses (a good overview of which
can be found in [27]). The following surveys recent work on compositional
analysis for numerical invariants.

Kroening et al. [17] and Biallas et al. [5] present compositional analysis
techniques based on predicate abstraction. In addition to predicate abstraction, there
are a few papers which use numerical abstract domains for compositional
analysis. These include an algorithm for detecting affine equalities between program
variables [23], an algorithm for detecting polynomial equalities between program
variables [7], a disjunctive polyhedra analysis which uses widening to compute
loop summaries [25], and a method for automatically synthesizing transfer
functions for template abstract domains using quantifier elimination [21]. Our
abstract domain is the set of arbitrary arithmetic formulas, which is more expressive
than these domains, but which (as usual) incurs a price in performance. It would
be interesting to apply abstractions to our formulas to improve the performance
of our analysis.

Linearization. Our linearization algorithm was inspired by Miné’s procedure
for approximating non-linear abstract transformers [20]. Miné’s procedure
abstracts non-linear terms by linear terms with interval coefficients using the
abstract value in the pre-state to derive intervals for variables. Our algorithm
abstracts non-linear terms by sets of symbolic and concrete intervals, and applies
to the more general setting of approximating arbitrary formulas.


7 Conclusion 

This paper presents a fully compositional algorithm for generating numerical
invariants of imperative programs. Our method for abstracting loops makes
essential use of compositionality: we assume that we are given a formula which
approximates the body of a loop, and we use an SMT solver to extract recurrence
relations and then use the closed forms of these recurrences to approximate
the loop. We have demonstrated experimentally that our method is competitive
with leading invariant generation and verification tools.








References 


1. Z. Ammarguellat and W. L. Harrison, III. Automatic recognition of induction
variables and recurrence relations by abstract interpretation. In PLDI, pages
283-295, 1990.

2. C. Ancourt, F. Coelho, and F. Irigoin. A modular static analysis approach to affine
loop invariants detection. Electron. Notes Theor. Comput. Sci., 267(1):3-16, Oct.
2010.

3. S. Bardin, A. Finkel, J. Leroux, and P. Schnoebelen. Flat acceleration in symbolic
model checking. In ATVA, pages 474-488, 2005.

4. B. Jeannet and A. Miné. Apron: A library of numerical abstract domains for
static analysis. In CAV, pages 661-667, 2009.

5. S. Biallas, J. Brauer, A. King, and S. Kowalewski. Loop leaping with closures. In
SAS, pages 214-230, 2012.

6. B. Boigelot and P. Wolper. Symbolic verification with periodic sets. In CAV, pages
55-67, 1994.

7. M. A. Colón. Approximating the algebraic relational semantics of imperative
programs. In SAS, pages 296-311, 2004.

8. P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among
variables of a program. In POPL, pages 84-96, 1978.

9. L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In TACAS, pages 337-340,
2008.

10. A. Farzan and Z. Kincaid. An algebraic framework for compositional program
analysis. CoRR, abs/1310.3481, 2013.

11. A. Finkel and J. Leroux. How to compose Presburger-accelerations: Applications
to broadcast protocols. In FST TCS, pages 145-156, 2002.

12. L. Gonnord and N. Halbwachs. Combining widening and acceleration in linear
relation analysis. In SAS, pages 144-160, 2006.

13. B. Jeannet, P. Schrammel, and S. Sankaranarayanan. Abstract acceleration of
general linear loops. In POPL, pages 529-540, 2014.

14. S. Kleene. Representation of events in nerve nets and finite automata. In
C. Shannon and J. McCarthy, editors, Automata Studies, pages 3-42. Princeton
University Press, Princeton, N.J., 1956.

15. L. Kovács. Reasoning algebraically about P-solvable loops. In TACAS, pages
249-264, 2008.

16. D. Kroening, M. Lewis, and G. Weissenbacher. Under-approximating loops in C
programs for fast counterexample detection. In CAV, pages 381-396, 2013.

17. D. Kroening, N. Sharygina, S. Tonetta, A. Tsitovich, and C. Wintersteiger. Loop
summarization using abstract transformers. In ATVA, pages 111-125, 2008.

18. J. Leroux and G. Sutre. Accelerated data-flow analysis. In SAS, pages 184-199,
2007.

19. Y. Li, A. Albarghouthi, Z. Kincaid, A. Gurfinkel, and M. Chechik. Symbolic
optimization with SMT solvers. In POPL, pages 607-618, 2014.

20. A. Miné. Symbolic methods to enhance the precision of numerical abstract
domains. In VMCAI, pages 348-363, 2006.

21. D. Monniaux. Automatic modular abstractions for linear constraints. In POPL,
pages 140-151, 2009.

22. D. Monniaux. Quantifier elimination by lazy model enumeration. In CAV, pages
585-599, 2010.



23. M. Müller-Olm and H. Seidl. Precise interprocedural analysis through linear
algebra. In POPL, pages 330-341, 2004.

24. G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate language
and tools for analysis and transformation of C programs. In CC, pages 213-228,
2002.

25. C. Popeea and W.-N. Chin. Inferring disjunctive postconditions. In ASIAN, pages
331-345, 2007.

26. T. W. Reps, S. Sagiv, and G. Yorsh. Symbolic implementation of the best
transformer. In VMCAI, pages 252-266, 2004.

27. B. G. Ryder and M. C. Paull. Elimination algorithms for data flow analysis. ACM
Comput. Surv., 18(3):277-316, Sept. 1986.

28. A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, Inc.,
New York, NY, USA, 1986.

29. M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis,
chapter 7, pages 189-234. Prentice-Hall, Englewood Cliffs, NJ, 1981.

30. R. E. Tarjan. Fast algorithms for solving path problems. J. ACM, 28(3):594-614,
July 1981.



